ABSTRACT

In this paper, we study certain geometric and topological 
properties of online social networks using the concept of 
density and geometric vector spaces. "Moi Krug" ("My 
Circle"), a Russian social network that promotes the prin- 
ciple of the "six degrees of separation" and is positioning 
itself as a vehicle for professionals and recruiters seeking 
each other's services, is used as a test vehicle.

Keywords: Online Social Network, Friend, Density, Met- 
ric Space, Vector Space, Chebyshev Space. 

1. OVERVIEW 

Social networks made their way onto the Internet and
became an important (if not the most important) part of
the socially oriented Web at the beginning of the 21st
century (Friendster 2002 [2], MySpace 2003, LinkedIn 2004,
Facebook 2004, Yahoo 360° 2005) [5], [9].

Initially, the predominant language of the social Web
was English or one of the languages based on the Latin-1
character set, especially Spanish and German. The participation
of Russian-speaking Web users in these social networks
was limited by the relative underdevelopment of the
Russian segment of the Internet and by the fact that most
Russian teenagers and young adults, who constitute the
core of social networks, felt it uncomfortable, if not
unpatriotic, to communicate in English. One noticeable
exception was the bilingual Russian diaspora, mainly
in the USA, Western Europe, and Israel, which successfully
integrated into the major English-speaking social
networks and is not the topic of this study.

The rapid proliferation of accessible high-speed Internet
access in Russia in the early 2000s [1] removed the
first obstacle, while the emergence of a new generation
of Russian Web programmers and reasonably cheap Web
hosting made it possible to develop Russian-language social
networks, which offered Russian as the default and
only interface language. Thus, the new social networks
formed an isolated cluster limited to Russian-speaking
participants, who historically tend to be ethnic Russians or
nationals of the C.I.S. and the Baltic states.

The most sizable Russian online social networks to date
are "Odnoklassniki" ("Classmates," //odnoklassniki.ru),
"V Kontakte" ("In contact," //vkontakte.ru), and "Moi Krug"
("My Circle," //moikrug.ru); the latter is hereafter referred
to as MKOSN. "Odnoklassniki" aims at classmates, schoolmates,
former coworkers, and "brothers-in-arms," mostly helping
people to reestablish lost contacts and later switch to
an alternative mode of communication, such as e-mail
or phone. It is organized by schools, universities, army
units, major companies, and, interestingly, popular
vacation sites. In this sense, it is not a place where
people socialize, but a "lost-and-found" directory. Until
recently, "Odnoklassniki" allowed users to post at most
one photograph to their profiles and did not have the
concept of a "friend."

"V Kontakte" is a recent copycat of Facebook. It has
virtually the same functionality, except for the lack of an 
open interface. 

By contrast, "friendship" is the core concept of
the MKOSN. The MKOSN introductory page emphasizes 
the principle of the "six degrees of separation," or "six 
handshakes," as it is known in Russia. The network is 
organized not by professional or academic affiliations, but 
by individuals and their "circles." Each MKOSN member 
is surrounded by his or her "first circle" of the immediate 
friends, the "second circle" of the friends of the friends, 
and the "third circle" of the friends of the friends of the 
friends (these three circles are implemented in the MKOSN 
explicitly). 

Despite an organizational structure based on the notion
of proximity rather than on occupation, MKOSN positions
itself as a vehicle for professionals and recruiters seeking
each other's services. The recruiters and HR specialists
with hundreds of "friends" in their inner circles have
influenced the network's statistics. However, the magnitude
of this influence cannot be easily estimated.

MKOSN is a relatively "young" network. It was put
into service in November 2005. To the best of our knowledge,
as of July 2007 the network had approximately 166
thousand members (see Appendix A). This makes it a
unique testing ground for various network exploration
algorithms: because of the network's small size, it is possible
to apply slow algorithms (such as $O(N^2)$ and even
NP-complete ones) to the entire network, rather than to a
subset, thus avoiding the burden of proving that the selected
subset adequately represents the network.

This paper studies the internal structure and the pecu- 
liarities of "Moi Krug," both from the mathematical and 
psychological points of view. 

2. MACROSCOPIC PARAMETERS 

A typical online social network consists of a giant core
$\Gamma$, a subnetwork that contains the majority of connected
members, and smaller marginal components not connected
to the core or to each other [10]. At the moment of
acquisition, the MKOSN was a tiny social network: it had
only 166 thousand nodes in the giant core. This number
is consistent with the estimate made in [8] several months
earlier (0.1 million users).

Figure 1: Node degree distribution in MKOSN is a double
Pareto distribution. The x-axis is the degree and the y-axis
is the number of nodes with that degree.

Footnote (to the description of "Odnoklassniki" in Section 1):
The concept of "Odnoklassniki" changed substantially in the
winter of 2007-2008: despite poor interaction support, the site
now has all essential features of a social network.

Various macroscopic parameters of online social networks
have been analyzed, e.g., in [3] and [5].

Node Degree 

One of the most fundamental properties of a non-directed 
graph is the node degree distribution. It has been observed 
that the node degrees of major social networks are dis- 
tributed according to the double Pareto law. The MKOSN 
node degree distribution is no exception (Figure 1).

Reed [7] suggests that the break point in the double 
Pareto distribution is due to the fact that the age of the 
observed nodes is distributed exponentially: the "young" 
nodes are on the left, and the "old" nodes are on the right. 
(If the nodes' ages were distributed uniformly, all nodes 
would be "young," and the distribution would follow the 
power law.) The fact that the MKOSN degree distribution 
has a break point may be an indication of the presence of 
at least two generations of nodes and members: "senior" 
members (those with 25 or more "friends;" these members 
have been in existence, say, since the establishment of the 
social network), and "junior" members. According to Figure 1,
the "senior" members constitute ≈3.6% of the giant
core size.

If seniority is indeed a reason for having the break point 
and the "senior" nodes were added to the network at about 
the same time (in other words, they were the core of the 
original social network), then they would be strongly con- 
nected in a sense that there would be at least one other 
"senior" node in the immediate vicinity of any other "se- 
nior" node. The MKOSN analysis confirms the hypothesis: 
out of 5,900 "senior" nodes in the giant core, only 66 (1%) 
do not have any "senior" neighbors. Moreover, the sub- 
graph of the "senior" nodes is dense: most "senior" nodes 
have 8-10 "senior" neighbors, and the average number of 
"senior" members in a neighborhood of a "senior" node is 
19. 
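
The "senior"-neighborhood check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the edge-list input format and the function name `senior_stats` are hypothetical; only the threshold of 25 "friends" comes from the text.

```python
from collections import defaultdict


def senior_stats(edges, threshold=25):
    """Return (seniors, isolated_seniors, mean_senior_neighbors).

    edges: iterable of (u, v) pairs of an undirected friendship graph
           (an assumed input format).
    threshold: minimum degree of a "senior" node (25 in the text).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    seniors = {n for n, nbrs in adj.items() if len(nbrs) >= threshold}
    # Number of "senior" neighbors of every "senior" node.
    counts = {n: len(adj[n] & seniors) for n in seniors}
    isolated = {n for n, c in counts.items() if c == 0}
    mean = sum(counts.values()) / len(seniors) if seniors else 0.0
    return seniors, isolated, mean
```

Calling `senior_stats(edges)` on the acquired edge list would give the set of "senior" nodes, those without any "senior" neighbor, and the average number of "senior" neighbors per "senior" node.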

Unfortunately, the absence of the dynamic data from 
MKOSN does not allow us to verify the hypothesis about 
the generational origin of the double Pareto distribution. 

Figure 2: Path length in the network 



Path Length 

The second important macroscopic parameter of a graph 
is the distribution of node-to-node path lengths. The 
MKOSN claims that it is built around the principle of the
"six degrees of separation." As it turns out (Figure 2),
there are on the order of $10^{10}$ paths in the network. The
longest path (the diameter of the network) is 14 hops long, 
which is more than twice the length of the legendary "six 
degrees." 

On the other hand, the mean path length is six hops: 
the majority of the nodes are indeed "six degrees" apart 
from one another. 
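
The path-length statistics above (the histogram, the diameter, and the mean) can be obtained with one breadth-first search per node. The sketch below is illustrative only: the adjacency-dictionary representation and the function names are assumptions, and the graph is assumed to be connected, as the giant core is.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_histogram(adj):
    """Count unordered node pairs by shortest-path length."""
    hist = Counter()
    for s in adj:
        for d in bfs_distances(adj, s).values():
            if d > 0:
                hist[d] += 1
    # Every unordered pair was counted once from each endpoint.
    return {d: c // 2 for d, c in hist.items()}


def diameter_and_mean(hist):
    """Longest shortest path and mean shortest-path length."""
    total = sum(hist.values())
    return max(hist), sum(d * c for d, c in hist.items()) / total
```

The running time is O(N(N + E)), which is affordable for a network of this size and is one reason the small MKOSN can be studied in full.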

3. FINE STRUCTURE OF THE MKOSN 

The macroscopic parameters presented in Section 2 help
us little to understand the internal structure of the social 
network. We propose three finer-grain approaches (two 
topological and one geometric) to the social network anal- 
ysis. 

Macroscopic topology 

One can see from Figure 1 that 25% of the MKOSN nodes
have a degree of 1. They are part of the giant core,
but are loosely connected to the rest of the network. The
corresponding network members were apparently introduced
to the MKOSN by their more active friends, but
had neither the time nor the desire to expand their contact lists.

A more careful analysis reveals that ≈2% of the network
nodes have a degree of two and are connected to
at least one other node that has a degree of one or two.
Such nodes form "tentacles" that expand from the denser
part of the network "outward." Thus, a tentacle is a
chain of network members, each of which is a "friend"
of the next one and the previous one, except for the last
marginal member (the loner), who has only one "friend."
The tentacles' lengths are distributed exponentially with
a mean length of 1 hop (Figure 3). The exponential nature
of the distribution suggests that the probability that a
loner adds another "friend" is constant.

The cyberpsychological nature of the tentacles is cur- 
rently unclear. 
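
One possible way to extract the tentacles is to start at every loner and walk inward through degree-two nodes until a node of higher degree is reached. The sketch below assumes an adjacency dictionary of sets and hypothetical function names; it follows the chain definition given above and is not necessarily the procedure used in the study.

```python
def tentacles(adj):
    """Map each loner (degree-1 node) to the chain of nodes in its tentacle.

    adj: dict mapping a node to the set of its neighbors (assumed format).
    The chain starts at the loner and excludes the attachment node, so
    len(chain) is the tentacle length in hops.
    """
    chains = {}
    for loner, nbrs in adj.items():
        if len(nbrs) != 1:
            continue
        chain, prev, cur = [loner], loner, next(iter(nbrs))
        # Follow the chain while the current node has exactly two "friends".
        while len(adj[cur]) == 2:
            chain.append(cur)
            prev, cur = cur, next(n for n in adj[cur] if n != prev)
        chains[loner] = chain
    return chains


def dense_core(adj):
    """Nodes that do not belong to any tentacle."""
    in_tentacle = {n for chain in tentacles(adj).values() for n in chain}
    return set(adj) - in_tentacle
```

The histogram of `len(chain)` over all loners gives the tentacle length distribution; a component that is itself a bare path is a degenerate case that does not arise in a large connected core.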



Footnote (to the word "outward" in the description of the
tentacles): The word "outward" is quoted because so far there
is no "out" direction in the network.

Figure 3: Path length in the tentacles

Figure 5: Path length in the dense core 



The nodes that do not belong to the tentacles form the
dense core $A$, a subset of the giant core $\Gamma$
($A \subset \Gamma$). The size of the MKOSN dense core
is 123 thousand nodes.

It is tempting to check whether the dense core is uniformly
dense or whether it has "cavities," possibly crossed by thin
"fibers." A fiber is very similar to a tentacle, except that the
"loner" end of a fiber is connected back to the dense core (in
topological terms, the dense core with a fiber is
a sphere with a handle). The MKOSN has 42,000 fibers
with an average length of 2 hops (Figure 4 shows the distribution
of the number of inner nodes in a fiber, which is
one less than the number of hops). The lengths of the fibers are
distributed exponentially, too.

The fibers are probably formed when the loners add 
"friends" with the same constant probability as during the 
tentacle construction, but one of the newly added "friends" 
is already a member of the dense core. The cyberpsycho- 
logical nature of the fibers is probably the same as that of 
the tentacles. 

As a result of the macroscopic topological study, we
identified the fine structure of the MKOSN (Figure 6): the
dense core, which amounts to 74% of the giant core, the
tentacles, and the fibers. Since the tentacles are very simple
in nature, they can be eliminated from further consideration.
The distribution of the node-to-node path lengths
in the dense core (including the fibers, but excluding the
tentacles) is shown in Figure 5. The mean path length in
the dense core is five hops, one hop less than in the giant
core, meaning that the majority of "regular" network
members are actually even closer to each other than the
network claims.

Mesoscopic Topology 

The macroscopic topological study does not address the 
structure of the dense core. In particular, we do not know 
if the core is uniformly dense and if it has inner and outer 
(boundary) nodes. The answer can be given by exploring 
the mesoscopic (medium-range) network topology. 

The idea of describing the mesoscopic topology of a so- 
cial network numerically is not new. An overview of major 
proposed mechanisms — cliques, n-clans, and k-plexes — is 
given in [3]. Our approach considers the social network 
as a continuous medium (which is somewhat acceptable if 
the number of network nodes is large) and is based on the 
density study. 

With no vector space associated with the network, we
have to redefine the density so that the new definition does
not depend on any vector properties. In particular, the
new definition cannot use the concept of volume. The
"classical" definition $\rho(x_0, \ldots, x_d) = \partial^{d+1} N / (\partial x_0 \cdots \partial x_d)$, where
$N$ is the number of nodes, is ruled out.

We can compare a social network to a crowd of people.
The crowd is dense around an individual if the individual
has many neighbors; otherwise, the crowd is sparse
(technically speaking, it is not a crowd). Apparently, the
number of network neighbors (the node degree) of a node
$N$ can serve as a reasonable estimate of the social network's
density in the vicinity of $N$: $\rho(N) = \mathrm{degree}(N)$.

Figure 4: Fiber length in the dense core

Figure 6: The macroscopic structure of the MKOSN

Figure 7: Depth distribution in the dense core (larger
depths correspond to the nodes that are closer to the
"boundary")

If the proposed definition of density is used, then the 
MKOSN is not uniform, and the density follows the dou- 
ble Pareto distribution that has already been discussed. 
However, the graph in Figure 1 does not reflect the spatial
density distribution.

To identify the "inner" and the "boundary" nodes of
$A$, we will use the observation that in a spherically symmetric
$D$-dimensional object, the mean distance $\bar{r}$ from
any point $P$ to all other points of the object reaches its
minimum if $P$ is at the center. Any displacement of $P$
from the center increases $\bar{r}$, and the maximum is reached
on the boundary of the object. In particular, if our object
were a uniform sphere, then for a point at
distance $r$ from the center, $\bar{r} = a(D) \sqrt{b^2(D) + r^2}$.

We define the distance between two nodes $P$ and $Q$ in
the dense core as the minimum length of the paths connecting
$P$ and $Q$. The depth of $P$, $d(P) = \bar{r}(P)$, is the mean
distance from $P$ to all other nodes in $A$. The distribution
of the depths in $A$ is shown in Figure 7. The mean depth
$\bar{d} = 5.5$ equals the mean path length in the dense core
(Figure 5). It is reasonable to assume that "boundary" nodes
lie on the right of the chart, and the "inner" nodes are in
the middle and on the left. The figure shows that if our
hypothesis about the spherical symmetry is correct, then
the majority of the nodes are concentrated in the middle
layer, with very few nodes on the true "boundary" and in
the "center." In other words, the overwhelming
majority of the dense core members are "average people,"
with very few marginal and socially popular members.

Figure 8: Dense core density $\rho$ as a function of depth $d$
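
The depth of a node, as defined above, is the mean hop distance from that node to all other nodes, so it can be computed with a single breadth-first search. The following sketch is an illustration only; the adjacency-dictionary input and the name `depth` are assumptions, and the graph is assumed to be connected with at least two nodes.

```python
from collections import deque


def depth(adj, node):
    """Mean shortest-path distance from node to all other nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # The node itself contributes distance 0 and is excluded from the mean.
    return sum(dist.values()) / (len(dist) - 1)
```

On a simple five-node chain, the middle node has the smallest depth and the two end nodes have the largest, which matches the "center" versus "boundary" intuition used in the text.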

Now we are ready to combine the "depth cues" and
the density information and plot $\rho$ against $d$ (Figure 8). It
follows from the graph that the dense core is dense in the
center and sparse at the outskirts: for the well-connected
members (high $\rho$), it is easier to reach the rest of the dense
core (low $d$).

By redefining the local density $\rho(N)$, we can explore
one more interesting aspect of the MKOSN. It is known
that people are either socially active (socially popular,
extroverts) or socially passive (marginals, introverts). A
socially popular person has a lot of "friends," who are not
necessarily popular themselves, or at least not as popular
as the person himself or herself. We can identify socially
popular members and marginals by comparing the density
at $N$ and in its neighborhood.

Let $e(N)$ be the set of all neighbors of $N$ in $A$, i.e., the first
circle of $N$. Then

$$\rho_e(N) = \frac{\sum_{a \in e(N)} \rho(a)}{|e(N)|} \qquad (1)$$

is the average density of the first circle. Let $E/I =
\rho_e(N)/\rho(N)$. Then $\Pi = \log_{10} E/I$ is the quantitative
personality, the measure of the social activity of a member.
Positive $\Pi$ means that the "friends" of the member
on average have more "friends" than the member, i.e.,
that member is socially passive, or a marginal. Negative
$\Pi$ identifies members that are socially popular and have
more "friends" than the members of their first circles.
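
Equation (1) and the quantitative personality translate directly into a few lines of code. The sketch below is a minimal illustration: the adjacency-dictionary input and the function name `personality` are assumptions, and every node is assumed to have at least one "friend," which holds inside the dense core.

```python
import math


def personality(adj, node):
    """Quantitative personality: log10 of (mean friend degree / own degree)."""
    rho = len(adj[node])                              # local density rho(N)
    rho_e = sum(len(adj[a]) for a in adj[node]) / rho  # Eq. (1)
    return math.log10(rho_e / rho)
```

For a star-shaped neighborhood, the hub gets a negative value (popular) and each leaf gets a positive value (marginal); in a triangle, where everyone has the same number of "friends," the value is zero.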

The distribution of $\Pi$ for the dense core of the MKOSN
is shown in Figure 9. The distribution is skewed to the
right: the marginals outnumber the popular members in
the dense core by a factor of 9. The ratio is even larger
for the entire MKOSN (10:1).

Finally, we can draw some conclusions about the likelihood
that a high-$\Pi$ (marginal) member is in touch with
a low-$\Pi$ (popular) member and the other way around.
Table 1 shows the fraction of marginal, neutral, and popular
"friends" in the first circle of an MKOSN member.

            Popular   Neutral   Marginal
Popular       26%       2%        72%
Neutral       24%       4%        72%
Marginal      43%       3%        53%

Table 1: Personalities in the MKOSN (rows: the member;
columns: the member's "friends")

The table suggests that the popular and neutral net- 
work members tend to cluster with the more marginal (or 
less popular) members, while the marginals do not have 
clear preferences. 

4. GEOMETRY 

The topological studies of online social networks focus on 
defining their structure, as well as the boundaries and the 
inner areas. However, they do not consider the positions 
of the nodes (network members).

It is tempting to construct a multidimensional vector 
space (perhaps even a linear space) that has the social 
network nodes as points. The coordinates of the nodes 
in such space may be related to the social properties of 
the underlying network or to the psychological properties 
of the network members. In this section, we attempt to 
elaborate the geometry of the MKOSN. 

The graph of a social network induces a discrete metric
space $M$, where the distance between two points (nodes),
$P$ and $Q$, is the minimum length of the paths connecting
$P$ and $Q$. The metric function $d_m(P, Q)$ is thus implicitly
defined. We want to embed this space into a $D$-dimensional
vector space with minimal distortion. In other words,
we want to assign a $D$-tuple of coordinates to each point
$P$ in $M$, $P \to X_P = (x_P^1, x_P^2, \ldots, x_P^D)$, and a metric
function $d_v(X, Y)$ so that $\forall P, Q \in M: P \neq Q \Rightarrow
|d_m(P, Q) / d_v(X_P, Y_Q) - 1| < \epsilon$. The value of $\epsilon$ is called
the distortion of the embedding [J] and ideally should be
infinitely small.

As a first approximation, we propose to use the following
vectorization procedure: Let us enumerate all nodes,
and let $M_i$ be the node number $i$ in $M$. Then for a node
$P \in M$, let $x_P^i = d_m(P, M_i)$. In other words, the $i$'th
coordinate of $P$ is the distance from $P$ to the $i$'th node (we
call the $i$'th node a reference node, or reference point). In
particular, $x_P^i = 0 \iff P = M_i$.

The metric function is based on the Chebyshev distance:

$$d_v(X_P, Y_Q) = \max_i \left| x_P^i - y_Q^i \right| \qquad (2)$$

It is not hard to see that $\forall P, Q: d_m(P, Q) = d_v(X_P, Y_Q)$.
Thus, the newly constructed space is a non-distorting embedding
of $M$.
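
The claim that the Chebyshev distance between the coordinate vectors reproduces the hop distance can be checked numerically on a small graph. The sketch below uses every node as a reference point, as in the procedure above; the function names and the adjacency-dictionary input are assumptions made for illustration, and the graph must be connected.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first hop distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embed(adj):
    """Return (coordinates, dist): one coordinate per reference node."""
    refs = sorted(adj)  # every node serves as a reference point
    dist = {p: hop_distances(adj, p) for p in adj}
    coords = {p: tuple(dist[p][m] for m in refs) for p in adj}
    return coords, dist


def chebyshev(x, y):
    """Chebyshev (maximum-coordinate) distance, Eq. (2)."""
    return max(abs(a - b) for a, b in zip(x, y))
```

The equality follows from the triangle inequality, which bounds every coordinate difference by the hop distance, together with the fact that the coordinate whose reference point is one of the two nodes attains that bound.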

Unfortunately, the dimensionality of the new space is 
too high: even for the relatively small MKOSN, there are 
166,000 dimensions, which is probably well beyond any 
practical use. We will use a modification of the Quine-McCluskey
method [6] to reduce the dimensionality.

In any general network, some of the dimensions are de- 
pendent. For example, in a standalone tentacle it's enough 
to define one linear coordinate, no matter how long the 
tentacle is. Several dimensions are dependent if removing 
all of them but one does not change the metric function 
on M. 



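Let's form matrix $Z$ of size $[D(D-1)/2 \times D]$ such that for
$i > j$:

$$z_{ij}^k = \begin{cases} 0 & \text{if } |x_i^k - x_j^k| < \max(0, d(P_i, P_j) - T), \\ 1 & \text{if } |x_i^k - x_j^k| \geq \max(0, d(P_i, P_j) - T), \end{cases} \qquad (3)$$

where $x_i^k$ is the $k$'th coordinate of the point $P_i$ and the
tolerance $T \geq 0$. $T = 0$ gives the exact solution.
A column $Z_k$ in $Z$ corresponds to the point $P_k$. We
say that $Z_k$ covers row $z_{ij}$ if $z_{ij}^k = 1$. If column $Z_k$ covers
row $z_{ij}$, then it is an essential implicant: removing point
$P_k$ from the set of reference points distorts the distance
between $P_i$ and $P_j$ by more than $T$ hops.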

If a point is an essential implicant, we add it to the
set $E$ of essential implicants and remove all rows from $Z$
that are covered by $Z_k$. The procedure is repeated until the
matrix $Z$ has no more rows. The points that are not in the
set $E$ are not the reference points of the new vector space.
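
The covering procedure can be sketched as a greedy selection of reference points. This is an illustration under assumptions, not the authors' implementation: the text does not say in which order the columns are examined, so the sketch picks, at each step, the column that covers the most remaining rows; the function name and the distance-matrix input are also hypothetical. All cover sets are stored explicitly, so the sketch is suitable for very small graphs only.

```python
from itertools import combinations


def select_reference_points(dist, T=0):
    """Greedily choose reference points that cover every node pair.

    dist: square list-of-lists matrix of hop distances (connected graph).
    T: tolerance in hops; T = 0 requires an exact embedding.
    """
    n = len(dist)
    rows = list(combinations(range(n), 2))
    # Column k covers row (i, j) if coordinate k alone separates
    # P_i and P_j to within T hops of their true distance, Eq. (3).
    covers = {
        k: {(i, j) for i, j in rows
            if abs(dist[i][k] - dist[j][k]) >= max(0, dist[i][j] - T)}
        for k in range(n)
    }
    uncovered, chosen = set(rows), []
    while uncovered:
        # Assumed heuristic: take the column covering most remaining rows.
        k = max(covers, key=lambda c: len(covers[c] & uncovered))
        chosen.append(k)
        uncovered -= covers[k]
    return chosen
```

For a standalone chain of nodes the sketch returns a single reference point, in line with the remark that one linear coordinate is enough for a tentacle. The loop always terminates because the pair (i, j) is covered at least by column i.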

In our experiments, we could easily reduce the number
of dimensions from 5 to 2 with no distortion and from 5 to 1
with a distortion of 25% ($T = 1$). Unfortunately, the storage
complexity of the proposed algorithm ($O(D^3)$) makes
it hard to use even for modestly sized social networks.

5. CONCLUSION 

In this paper, we investigated several topological and geometric
approaches to the structural studies of online social
networks in general and of the "Moi Krug" OSN in partic- 
ular. We introduced the concepts of the dense core and the 
local density and analyzed the density distribution within 
the dense core. We further used the density to identify 
socially popular and socially marginal network members. 
We attempted to embed the topological metric space in- 
duced by the social network graph into a vector space and 
concluded that the embedding is either impractical due to 
the huge dimensionality of the resulting space or is com- 
putationally expensive and still not very practical. We 
conclude that the exploratory mechanisms based on topo- 
logical properties are promising and preferable over the 
geometric mechanisms. 

A APPENDIX: NETWORK ACQUISITION 

Because the administration of MKOSN does
not disclose the total number of participants, and the
"rumor-based" estimates vary from 80,000 to 400,000
members, it was important to develop a mechanism that
would allow us to learn the approximate network size in
advance. If the number turned out to be large, we would have
to limit our research to a subset of the network, rather than
study the entire network.

Figure 10: Number of discovered (but not processed) nodes
$D$ vs. the number of processed nodes $P$: experimental data
and an empirical graph

Figure 11: Estimated and actual network size at various
stages of the network acquisition

The total size of the network S is the sum of the num- 
ber P of nodes that have been already discovered and pro- 
cessed, the number D of nodes that have been discovered, 
but not processed, and the unknown number X of nodes 
that have not been seen yet: 



$$S = P + D + X \qquad (4)$$



The processed nodes can be compared to the interior
of the explored subspace $C_e$ of the space $C$, while
the unprocessed nodes can be compared to its boundary.
Processing $\Delta P$ nodes leads to the discovery of $\Delta D$ new
nodes by following the links to these nodes (on average,
$L$ "external" links per node), and to simultaneously moving
$\Delta P$ nodes from the set of discovered nodes into the
set of processed nodes: $\Delta D = L \Delta P - \Delta P$. Therefore,
$L = \Delta D / \Delta P + 1 \approx D' + 1$. Intuitively, when $P > X$,
then $X \approx D L \approx D (D' + 1)$, and Eq. (4) becomes:

$$S \approx P + D + (D' + 1) D \qquad (5)$$



Assuming that ideally $S$ should not depend on $P$, i.e.,
$S' = 0$, Eq. (5) can be rewritten as a second-order differential
equation:

$$D D'' + (D' + 1)^2 = 0 \qquad (6)$$
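
The step from Eq. (5) to Eq. (6) can be written out explicitly. The derivation below is a worked check that treats the approximation in Eq. (5) as an equality and denotes differentiation with respect to $P$ by a prime.

```latex
\begin{align*}
S   &= P + D + (D' + 1)\,D \\
S'  &= 1 + D' + D''\,D + (D' + 1)\,D' \\
    &= D\,D'' + D'^2 + 2D' + 1 \\
    &= D\,D'' + (D' + 1)^2
\end{align*}
```

Setting $S' = 0$ gives the second-order equation for $D(P)$.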



This equation does not have a closed-form solution. In
our case, the acquisition curve can be very closely approximated
by the following fractional-rational function (Figure 10):



$$D = a_0 P \, \frac{P^2 + a_1 P + a_2}{P^2 + a_3 P + a_4} \qquad (7)$$

This function also closely matches a corresponding numerical
solution of Eq. (6).



By combining the experimental values of $P$ and $D$ and
the evaluated value of $D'$, we can estimate the total network
size at the early stages of network acquisition (Figure 11).
The difference between the predicted and actual
sizes is less than 10% for $P > 35{,}000$, which means that
the network size can be estimated fairly well after the
acquisition of only 20% of the nodes.
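
The size estimate of Eq. (5) can be evaluated at successive crawl checkpoints. The sketch below is a minimal illustration under assumptions: it approximates $D'$ by a finite difference between consecutive checkpoints, whereas the text speaks only of an "evaluated value" of $D'$, and the function name and list-based inputs are hypothetical.

```python
def estimate_network_size(processed, discovered):
    """Estimate S = P + D + (D' + 1) * D at each crawl checkpoint.

    processed:  increasing list of P values (nodes already processed).
    discovered: matching list of D values (discovered, not yet processed).
    D' is approximated by the slope between consecutive checkpoints
    (an assumption of this sketch).
    """
    estimates = []
    for i in range(1, len(processed)):
        d_prime = (discovered[i] - discovered[i - 1]) / (
            processed[i] - processed[i - 1])
        P, D = processed[i], discovered[i]
        estimates.append(P + D + (d_prime + 1) * D)
    return estimates
```

In practice the finite difference is noisy, so smoothing the $D(P)$ curve before differentiating, for example with a fitted function such as the one used for the acquisition curve, may give steadier estimates.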